(54) Format conversion of graphical image data words



(57) Data components in a first, processing format, 
each of which includes a selected portion which repre- 
sents the data component in a second, display format, 
are merged to form an interleaved data word in which 
the selected portions of data components are grouped. 
For example, two pixel components, which are repre- 
sented in a two-byte format in which the least significant 
byte represents each pixel component in a one-byte for- 
mat, are merged to form a four-byte interleaved word in 
which the first two bytes are the most significant bytes of 
the pixel components in the two-byte format and in
which the next two bytes are the least significant bytes 
of the pixel components in the two-byte format. Since 
the least significant bytes of the pixel components in the 
two-byte format are equivalent to the two pixel compo- 
nents represented in the one-byte format, the two pixel 
components are effectively converted to a two-byte 
word in which each pixel component is represented in 
the one-byte format. A merge computer instruction is 
capable of interleaving respective bytes of two four-byte 
words and is used once to group most significant bytes 
and least significant bytes of first and second pixel com- 
ponents represented in a two-byte format and to group 
most significant bytes and least significant bytes of third 
and fourth pixel components represented in the two- 
byte format and a second time to group the most signif-
icant bytes of the first, second, third, and fourth pixel
components and to group the least significant bytes of 
the first, second, third, and fourth pixel components. 
The least significant bytes of the first, second, third, and 
fourth pixel components represent the first, second, 
third, and fourth pixel components in a one-byte format 
and are stored as the respective pixel components in 
the one-byte format. Thus, four pixel components are 
converted from a two-byte format to a one-byte format
using only two computer instructions. Eight contiguous
bytes can be accessed in a single read computer 
instruction or a single write computer instruction. 
Accordingly, two read computer instructions retrieve 
eight pixel components in a two-byte format. The eight 
pixel components are converted to a one-byte format 
using four merge computer instructions and are stored 
in memory using a single write computer instruction. 
Accordingly, a four-band graphical image which 
includes one million pixels can be converted from a two- 
byte processing format to a one-byte display format 
using one million read computer instructions, one-half 
million merge computer instructions, and one-half mil- 
lion write computer instructions. 
Description

FIELD OF THE INVENTION 

The present invention relates to graphical image
processing in a computer system and, in particular, to a 
particularly efficient mechanism for recasting pixels of a 
graphical image which are represented in a 16-bit for- 
mat into pixels represented in an 8-bit format. 

BACKGROUND OF THE INVENTION 

In many computer graphics systems in use today,
individual picture elements, i.e., pixels, of a graphical 
image are stored in a particular format. For example, 
single-band grayscale pixels are commonly stored as 
unsigned eight-bit integers, and 4-band color pixels are 
commonly stored as four contiguous unsigned eight-bit 
integers. Graphical images, which are generated using 
data representing a model and a computer process 
such as a three-dimensional modeling system, fre- 
quently involve complex numerical calculations. It is 
common for a graphical image to be rendered while pix- 
els of the graphical image are represented in a format 
which provides greater precision than the particular for-
mat in which displayed pixels are stored. For example,
in a computer graphics system in which each band of a 
displayed pixel is stored as an eight-bit unsigned inte- 
ger, each band of a pixel is frequently stored as a six- 
teen-bit unsigned integer during processing and is 
converted to an eight-bit unsigned integer substantially 
immediately prior to display of the pixel. Such format 
conversion of each band of a pixel is generally referred 
to as recasting the pixel. 

Recasting conventionally requires (i) loading from 
the memory of a computer a single pixel or a single 
band of a pixel at a time, (ii) converting the pixel or the
band of the pixel to a display format, and (iii) storing the
converted pixel or band of a pixel. Graphical images
commonly have approximately one thousand rows and 
approximately one thousand columns of pixels, i.e., 
approximately one million pixels, and color graphical 
images typically include four bands per pixel. Therefore, 
recasting by such conventional techniques typically 
involves approximately four million load operations and 
approximately four million store operations. In addition, 
the recasting of a pixel or a band of a pixel typically 
requires at least one computer instruction per pixel or 
per band of each pixel. Therefore, approximately 
another four million computer instructions are required 
to recast each band of each pixel of a typical graphi-
cal image.
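
The conventional technique described above can be sketched as follows. This is a minimal illustration only, not code from the source: the function name `recast_conventional`, the big-endian byte order, and the choice to keep just the low-order eight bits of each component are assumptions made for the example.

```python
import struct


def recast_conventional(processing: bytes) -> bytes:
    """Recast 16-bit components to 8-bit, one component per iteration.

    Assumes big-endian 16-bit unsigned components whose low-order byte
    already holds the display value (a hypothetical conversion rule).
    """
    display = bytearray()
    for offset in range(0, len(processing), 2):
        # One load per component.
        (component,) = struct.unpack_from(">H", processing, offset)
        # One conversion and one store per component.
        display.append(component & 0xFF)
    return bytes(display)


# Four components already scaled to the range 0..255.
src = struct.pack(">4H", 0, 17, 128, 255)
dst = recast_conventional(src)
```

Each component costs one load, one conversion, and one store in this sketch, which is the per-component overhead the paragraph above counts in the millions for a typical image.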

Processing of graphical images typically requires 
substantial processing resources. Requiring substantial 
processing resources to recast the pixels of a graphical 
image into a display format only adds to the processing 
resources required to render and display a graphical 
image. Because of the substantial computer system
resources required for such graphical image recasting,
a need persists in the industry for ever increasing effi- 
ciency in recasting of pixels or bands of pixels of graph- 
ical images from a high-precision processing format to a 
space-efficient display format.

SUMMARY OF THE INVENTION 

In accordance with the present invention, data com- 
ponents in a first, processing format, each of which 
includes a selected portion which represents the data 
component in a second, display format, are merged to
form an interleaved data word in which the selected por- 
tions of data components are grouped. For example, 
two pixel components, which are represented in a two- 
byte format in which the least significant byte represents 
each pixel component in a one-byte format, are merged 
to form a four-byte interleaved word in which the first two 
bytes are the most significant bytes of the pixel compo- 
nents in the two-byte format and in which the next two 
bytes are the least significant bytes of the pixel compo-
nents in the two-byte format. Since the least significant
bytes of the pixel components in the two-byte format are 
equivalent to the two pixel components represented in 
the one-byte format, the two pixel components are 
effectively converted to a two-byte word in which each 
pixel component is represented in the one-byte format. 

Further in accordance with the present invention, a 
merge computer instruction is capable of interleaving 
respective bytes of two four-byte words and is used 
once to group most significant bytes and least signifi- 
cant bytes of first and second pixel components repre- 
sented in a two-byte format and to group most 
significant bytes and least significant bytes of third and 
fourth pixel components represented in the two-byte for- 
mat and a second time to group the most significant 
bytes of the first, second, third, and fourth pixel compo- 
nents and to group the least significant bytes of the first, 
second, third, and fourth pixel components. The least 
significant bytes of the first, second, third, and fourth 
pixel components represent the first, second, third, and 
fourth pixel components in a one-byte format and are 
stored as the respective pixel components in the one- 
byte format. Thus, four pixel components are converted 
from a two-byte format to a one-byte format using only 
two computer instructions. 

Further in accordance with the present invention, 
eight contiguous bytes can be accessed in a single read 
computer instruction or a single write computer instruc- 
tion. Accordingly, two read computer instructions 
retrieve eight pixel components, each of which is repre-
sented in a two-byte format. The eight pixel components
are converted to a one-byte format using four merge 
computer instructions and are stored in memory using a 
single eight-byte write computer instruction. Accord- 
ingly, a four-band graphical image which includes one 
million pixels can be converted from a two-byte process- 
ing format to a one-byte display format using one million 
read computer instructions, one-half million merge com-
puter instructions, and one-half million write computer
instructions. Each write computer instruction can 
require an additional move computer instruction to form 
eight contiguous bytes of pixel data in an appropriate 
form for storage. The present invention therefore repre-
sents a significant improvement over conventional tech-
niques.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 is a block diagram of a computer system 
which includes an image processor which recasts 
graphical image data in accordance with the present 
invention. 

Figure 2 is a logic flow diagram illustrating the 
recasting of graphical image data by the image proces- 
sor of Figure 1 in accordance with the present invention. 

Figure 3 is a block diagram illustrating merge oper- 
ations used by the image processor of Figure 1 to recast 
graphical image data in accordance with the present 
invention. 

Figure 4 is a block diagram illustrating a merge 
operation performed by a computer processor of Figure 
1. 

Figure 5 is a block diagram of the computer proces-
sor of Figure 1 in greater detail. 

DETAILED DESCRIPTION 

In accordance with the present invention, data com- 
ponents in a first, processing format, each of which 
includes a selected portion which represents the data 
component in a second, display format, are merged to 
form an interleaved data word in which the selected por- 
tions of data components are grouped. For example, 
two pixel components, which are represented in a two- 
byte format in which the least significant byte represents 
each pixel component in a one-byte format, are merged
to form a four-byte interleaved word in which the first two 
bytes are the most significant bytes of the pixel compo- 
nents in the two-byte format and in which the next two 
bytes are the least significant bytes of the pixel compo- 
nents in the two-byte format. Since the least significant 
bytes of the pixel components in the two-byte format are 
equivalent to the two pixel components represented in 
the one-byte format, the two pixel components are 
effectively converted to a two-byte word in which each 
pixel component is represented in the one-byte format. 
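
The two-component example can be illustrated with a short sketch. The function name `interleave_pair` and the sample values are hypothetical; the sketch only assumes that each component is stored most significant byte first, as in the example above.

```python
def interleave_pair(first: bytes, second: bytes) -> bytes:
    """Interleave two 2-byte components into a 4-byte word.

    The result holds both most significant bytes first, followed by both
    least significant bytes.
    """
    if len(first) != 2 or len(second) != 2:
        raise ValueError("each component must be exactly two bytes")
    return bytes([first[0], second[0], first[1], second[1]])


# Two components whose values (42 and 200) fit in the low-order byte.
word = interleave_pair(bytes([0x00, 42]), bytes([0x00, 200]))
one_byte_format = word[2:]  # the grouped least significant bytes
```

The last two bytes of `word` form the two-byte word in which each component is in the one-byte format.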

Hardware Components of the Image Processing Sys- 
tem 

To facilitate appreciation of the present invention, 
the hardware components of the recasting system are 
briefly described. Computer system 100 (Figure 1) 
includes a processor 102 and memory 104 which is 
coupled to processor 102 through a bus 106. Processor
102 fetches from memory 104 computer instructions
and executes the fetched computer instructions. Proc-
essor 102 also reads data from and writes data to mem-
ory 104 and sends data and control signals through bus
106 to one or more computer display devices 120 in
accordance with fetched and executed computer 
instructions. Processor 102 is described in greater 
detail below. 

Memory 104 can include any type of computer
memory and can include, without limitation, randomly
accessible memory (RAM), read-only memory (ROM),
and storage devices which include storage media such
as magnetic and/or optical disks. Memory 104 includes
an image processor 110, which is a computer process
executing within processor 102 from memory 104. A
computer process is a collection of computer instruc-
tions and data which collectively define a task per-
formed by computer system 100. As described more
completely below, image processor 110 (i) reads pixels
in a processing format from processing buffer 112, (ii)
recasts the pixels in the processing format to pixels in a
display format, and (iii) stores the pixels in the display
format in display buffer 114.

Processing buffer 112 and display buffer 114 are
stored in memory 104. Processing buffer 112 stores
data representing pixels of a graphical image in a
processing format. In one embodiment, the processing
format includes a sixteen-bit unsigned integer to repre-
sent each band of each pixel. For example, if the graph-
ical image represented by processing buffer 112 is a
single-band grayscale graphical image, each pixel of
the graphical image is represented by a single sixteen-
bit unsigned integer. Similarly, if the graphical image
represented by processing buffer 112 is a four-band
color graphical image whose bands are alpha, blue,
green, and red, each pixel of the graphical image is rep-
resented by four contiguous sixteen-bit unsigned inte-
gers which represent alpha, blue, green, and red
components of the pixel.

Display buffer 114 can be any graphical image
buffer used in graphical image processing. For example,
display buffer 114 can be a Z buffer which is used in a
conventional manner to remove hidden surfaces from a
rendered graphical image. Alternatively, display buffer
114 can be a frame buffer whose contents are immedi-
ately displayed in one of computer display devices 120.
Each of computer display devices 120 can be any type
of computer display device including without limitation a
printer, a cathode ray tube (CRT), a light-emitting diode
(LED) display, or a liquid crystal display (LCD). Each of
computer display devices 120 receives from processor
102 control signals and data and, in response to such
control signals, displays the received data. Computer
display devices 120, and the control thereof by proces-
sor 102, are conventional.

The display format is a format of the data which is 
suitable for receipt and display of the data by one or 
more of computer display devices 120. In one embodi-
ment, the display format includes an eight-bit unsigned
integer to represent each band of each pixel. For exam-
ple, if the graphical image represented by display buffer
114 is a single-band grayscale graphical image, each
pixel of the graphical image is represented by a single
eight-bit unsigned integer. Similarly, if the graphical
image represented by display buffer 114 is a four-band
color graphical image whose bands are alpha, blue,
green, and red, each pixel of the graphical image is rep-
resented by four contiguous eight-bit unsigned inte-
gers which represent alpha, blue, green, and red
components of the pixel.
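
As a rough illustration of the two layouts, the following sketch packs one four-band pixel in each format using Python's `struct` module. The names `PROCESSING_PIXEL` and `DISPLAY_PIXEL`, the sample band values, and the big-endian byte order are assumptions made for the example.

```python
import struct

# Four contiguous sixteen-bit unsigned integers (alpha, blue, green, red).
PROCESSING_PIXEL = struct.Struct(">4H")
# Four contiguous eight-bit unsigned integers (alpha, blue, green, red).
DISPLAY_PIXEL = struct.Struct("4B")

abgr = (255, 16, 32, 64)  # sample band values, each within 0..255
processing_bytes = PROCESSING_PIXEL.pack(*abgr)
display_bytes = DISPLAY_PIXEL.pack(*abgr)

# With big-endian storage, every second byte of the processing-format
# pixel is the least significant byte of one band.
low_bytes = processing_bytes[1::2]
```

When every band value fits in eight bits, `low_bytes` equals `display_bytes`, which is the property the recasting described below relies on.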

The recasting of pixels from the processing format
in processing buffer 112 to the display format in display
buffer 114 by image processor 110 is illustrated as logic
flow diagram 200 (Figure 2). Processing according to
logic flow diagram 200 begins with loop step 202. Loop
step 202 and next step 216 define a loop in which image
processor 110 (Figure 1) processes each band of each
pixel of processing buffer 112 according to steps 204-
214. Eight pixel components represented in processing
buffer 112 are processed in a single iteration of the loop
defined by loop step 202 and next step 216. For exam-
ple, if the graphical image represented in processing
buffer 112 is a single-band grayscale graphical image,
eight pixels are processed in a single iteration of the
loop defined by loop step 202 and next step 216. On the
other hand, if the graphical image represented in
processing buffer 112 is a four-band color graphical
image, eight pixel components which collectively repre-
sent two pixels are processed in a single iteration of the
loop defined by loop step 202 and next step 216. Eight
components are processed in each iteration of the loop
defined by steps 202 and 216 in this illustrative embod-
iment because the largest single write operation which
can be performed by processor 102 (Figure 1) can write
eight components in the display format to display buffer
114 at once. For each eight of the components of the
pixels of processing buffer 112, processing transfers
from loop step 202 to step 204.

In step 204, image processor 110 (Figure 1) reads
eight pixel components in the processing format from
processing buffer 112. Processor 102 performs a read
operation in which sixteen contiguous bytes of data can
be read from memory 104. Image processor 110
invokes the read operation and causes processor 102 to
perform a data alignment operation which shifts the
read data such that the byte representing the first of the
eight pixel components of processing buffer 112 to be
processed according to the current iteration of the loop
defined by loop step 202 (Figure 2) and next step 216 is
aligned on an eight-byte boundary. The first eight bytes
of the aligned data represent four pixel components in
the processing format, e.g., four pixel components rep-
resented by sixteen-bit unsigned integers. The second
four pixel components processed in the current iteration
of the loop defined by steps 202 and 216 are read from
processing buffer 112 in a second read operation and a
second, corresponding data alignment operation.

In a preferred embodiment, image processor 110 
(Figure 1) determines whether the first sixteen bytes of 
data read in step 204 (Figure 2) are already aligned on 
an eight-byte boundary prior to performing the data align-
ment operation. If the sixteen bytes of data are already
so aligned, image processor 110 (Figure 1) does not 
perform the data alignment operation and the data read 
in a single read operation represents all eight pixel com- 
ponents. 

While data representing eight pixel components in
the processing format are retrieved substantially simul-
taneously, data representing four pixel components are
converted from the processing format to the display for-
mat substantially simultaneously. Thus, eight contigu-
ous bytes representing the first four pixel components
read from processing buffer 112 are stored in data dou-
ble word 302 (Figure 3) of image processor 110 (Figure
1). Data double word 302 (Figure 3) includes eight par-
titioned bytes H0, L0, H1, L1, H2, L2, H3, and L3. Bytes
H0 and L0 represent most significant and least signifi-
cant bytes of the first pixel component. Similarly, bytes
H1 and L1 represent most significant and least signifi-
cant bytes of the second pixel component; bytes H2 and
L2 represent most significant and least significant bytes
of the third pixel component; and bytes H3 and L3 rep-
resent most significant and least significant bytes of the
fourth pixel component. In data double word 302, each
of the four pixel components is processed such that
the least significant byte of each pixel component in
processing format is equivalent to the same pixel com-
ponent in display format. In one embodiment, process-
ing of pixel components while stored in processing
buffer 112 (Figure 1) scales the pixel components such
that the least significant portion of each pixel compo-
nent represents the pixel component in the display for-
mat. Since processing of pixel components typically
involves scaling pixel components, the scale factor can
be adjusted such that the result of such processing is a
pixel component whose least significant portion accurately
represents the pixel component in the display format. In
this illustrative embodiment, pixel components are proc-
essed in the processing format of sixteen-bit unsigned
integers but are scaled during processing to have a
value in the range of zero to 255 which is represented
by the least significant eight bits of the pixel component.
As a result, the most significant portion of the pixel com-
ponent in processing format, e.g., the eight most signif-
icant bits in this illustrative embodiment, is zero.
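
The byte layout of data double word 302 can be illustrated with a small sketch. The helper name `split_double_word` and the sample component values are hypothetical, and big-endian storage is assumed so that each most significant byte precedes its least significant byte.

```python
import struct


def split_double_word(double_word: bytes) -> list[tuple[int, int]]:
    """Return (H, L) byte pairs for the four 16-bit components."""
    if len(double_word) != 8:
        raise ValueError("a data double word is eight bytes long")
    return [(double_word[i], double_word[i + 1]) for i in range(0, 8, 2)]


# Four components already scaled into the range 0..255 during processing.
components = (0, 100, 200, 255)
dw302 = struct.pack(">4H", *components)
pairs = split_double_word(dw302)  # [(H0, L0), (H1, L1), (H2, L2), (H3, L3)]
```

Because every component lies in the range zero to 255, each H byte in `pairs` is zero and each L byte equals the component's display value.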

In an alternative embodiment, partitioned arithmetic
operations are performed by processor 102 (Figure 1)
on data double word 302 (Figure 3) to scale each of the
four pixel components represented in data double word
302 substantially simultaneously such that the least sig-
nificant portion of each of the pixel components repre-
sents the pixel component in the display format. Such
partitioned operations are described more completely, for
example, in (i) United States patent application serial
number 08/236,572 by Timothy J. Van Hook, Leslie
Dean Kohn, and Robert Yung, filed April 29, 1994 and
entitled "A Central Processing Unit with Integrated
Graphics Functions" (the '572 application) and (ii)
United States patent application serial number
08/398,111 by Chang-Guo Zhou and Daniel S. Rice,
filed March 3, 1995 and entitled "Color Format Conver-
sion in a Parallel Processor" (the '111 application), both
of which are incorporated in their entirety herein by ref-
erence.

In step 204 (Figure 2), image processor 110 (Figure
1) stores the second four pixel components in a data
double word 312 (Figure 3) in a directly analogous man-
ner to that described above with respect to data double
word 302. Processing transfers from step 204 (Figure 2)
to step 206.

In step 206, image processor 110 (Figure 1)
merges bytes H0 (Figure 3), L0, H1, and L1 with bytes
H2, L2, H3, and L3 using a PMERGE operation 304
which is performed by processor 102 (Figure 1) and is
illustrated in Figure 4. Data word 402 is 32 bits in length
and includes four partitioned bytes 402A-D. Similarly,
data word 404 is 32 bits in length and includes four par-
titioned bytes 404A-D. The PMERGE operation inter-
leaves respective bytes of data words 402 and 404 into
a double data word 406 as shown. Double data word
406 is 64 bits in length and includes eight partitioned
bytes 406A-H. The result of PMERGE operation 304
(Figure 3) is data double word 306 which is 64 bits in
length and whose eight partitioned bytes have the fol-
lowing values: H0, H2, L0, L2, H1, H3, L1, and L3.
Processing transfers from step 206 (Figure 2) to step
208.
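
The byte interleaving performed by the PMERGE operation can be modeled in a few lines. This is a software sketch of the behavior described above, not the processor instruction itself; the function name `pmerge` and the use of string labels in place of byte values are choices made for illustration.

```python
def pmerge(upper: list[str], lower: list[str]) -> list[str]:
    """Interleave respective elements of two four-element words."""
    if len(upper) != 4 or len(lower) != 4:
        raise ValueError("each input word must hold exactly four bytes")
    merged = []
    for a, b in zip(upper, lower):
        merged.extend((a, b))
    return merged


# Data double word 302, with labels standing in for byte values.
dw302 = ["H0", "L0", "H1", "L1", "H2", "L2", "H3", "L3"]

# Step 206: merge the upper four bytes with the lower four bytes.
dw306 = pmerge(dw302[:4], dw302[4:])
```

Tracing the labels shows that `dw306` holds H0, H2, L0, L2, H1, H3, L1, and L3, the ordering given for data double word 306.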

In step 208, image processor 110 (Figure 1)
merges upper four bytes 306H (Figure 3) of data double
word 306 and lower four bytes 306L of data double word
306 using a PMERGE operation 308, which is directly
analogous to PMERGE operation 304 described above.
The result of PMERGE operation 308 is double data
word 310 which is 64 bits in length and whose eight par-
titioned bytes have the following values: H0, H1, H2, H3,
L0, L1, L2, and L3. Processing transfers from step 208
(Figure 2) to step 210.
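
Steps 206 and 208 together can be sketched on concrete byte values. The function name `pmerge` and the sample components are hypothetical, and the sketch models the merge in software under the assumption of big-endian component storage.

```python
import struct


def pmerge(upper: bytes, lower: bytes) -> bytes:
    """Interleave respective bytes of two 4-byte words into 8 bytes."""
    if len(upper) != 4 or len(lower) != 4:
        raise ValueError("each input word must be exactly four bytes")
    return bytes(b for pair in zip(upper, lower) for b in pair)


# Four 16-bit components whose values fit in the low-order byte.
dw302 = struct.pack(">4H", 10, 20, 30, 40)

dw306 = pmerge(dw302[:4], dw302[4:])  # step 206: H0 H2 L0 L2 H1 H3 L1 L3
dw310 = pmerge(dw306[:4], dw306[4:])  # step 208: H0 H1 H2 H3 L0 L1 L2 L3

display_components = dw310[4:]  # lower four bytes: L0, L1, L2, L3
```

After the second merge, the lower four bytes of `dw310` are the four components in the one-byte format, in their original order.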

In step 210, image processor 110 (Figure 1)
merges the second four pixel components stored in data
double word 312 using a PMERGE operation 314 in a
directly analogous manner to that described above with
respect to step 206 (Figure 2) to produce data double
word 316 whose eight partitioned bytes are H4 (Figure
3), H6, L4, L6, H5, H7, L5, and L7. Processing transfers
to step 212 (Figure 2) in which image processor 110
(Figure 1) merges upper four bytes 316H (Figure 3) and
lower four bytes 316L of data double word 316 repre-
senting the second four pixel components in a directly
analogous manner to that described above with respect
to step 208 (Figure 2). The result of PMERGE operation
318 (Figure 3) is double data word 320 which is 64 bits
in length and whose eight partitioned bytes have the fol-
lowing values: H4, H5, H6, H7, L4, L5, L6, and L7.

As described above, the least significant byte of
each of the pixel components in the processing format
accurately represents the pixel component in the dis-
play format. Since bytes L0, L1, L2, and L3 are the least
significant bytes of the first four pixel components
retrieved in step 204 (Figure 2), bytes L0 (Figure 3), L1,
L2, and L3 accurately represent the first four pixel com-
ponents in the display format. Similarly, bytes L4, L5,
L6, and L7 are the least significant bytes of the second
four pixel components retrieved in step 204 (Figure 2)
and therefore accurately represent the second four pixel
components in the display format. In step 214, image
processor 110 (Figure 1) writes to display buffer 114
lower four bytes 310L (Figure 3) of data double word
310 and lower four bytes 320L of data double word 320,
which collectively form data double word 322 whose
eight partitioned bytes have the values L0, L1, L2, L3,
L4, L5, L6, and L7. In one embodiment, image proces-
sor 110 (Figure 1) combines lower four bytes 310L (Fig-
ure 3) of data double word 310 and lower four bytes
320L of data double word 320 to form data double word
322 prior to writing data double word 322 to display
buffer 114 (Figure 1) in a single write operation.
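
Step 214 can be sketched as follows, with a `bytearray` standing in for the display buffer. The function name `combine_lower_halves`, the sample byte values, and the slice assignment used to model the single eight-byte write are assumptions made for the example.

```python
def combine_lower_halves(dw310: bytes, dw320: bytes) -> bytes:
    """Form an 8-byte double word from the lower four bytes of each input."""
    if len(dw310) != 8 or len(dw320) != 8:
        raise ValueError("expected two 8-byte double words")
    return dw310[4:] + dw320[4:]


# Results of the second merge for each group of four components
# (upper four bytes are the zeroed most significant bytes).
dw310 = bytes([0, 0, 0, 0, 10, 20, 30, 40])
dw320 = bytes([0, 0, 0, 0, 50, 60, 70, 80])

dw322 = combine_lower_halves(dw310, dw320)

display_buffer = bytearray(8)
display_buffer[0:8] = dw322  # models the single eight-byte write
```

The combined double word carries L0 through L7 in order, so one write stores all eight converted components.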

Thus, eight pixel components are converted from a 
processing format to a display format using only two 
read operations and a single write operation. In addi- 
tion, four pixel components are converted from the 
processing format to the display format in only two 
PMERGE operations. Accordingly, converting one mil-
lion four-band color pixels in processing format in
processing buffer 112 to display format in display buffer
114 requires only one million read operations, 500,000
write operations, and 500,000 PMERGE operations. By
contrast, conventional conversion techniques typically 
require four million read operations, four million write 
operations, and at least four million operations to con- 
vert each pixel component. Therefore, the present 
invention represents a significant improvement over 
conventional graphical image format conversion tech- 
niques. 

As described above, storage of pixels in display 
buffer 114 can result immediately or indirectly in display
of such pixels in one or more of computer display 
devices 120. From step 214 (Figure 2), processing 
transfers through next step 216 to loop step 202 in 
which the next eight pixel components stored in 
processing buffer 112 are processed according to steps
204-214. Once all pixel components stored in process- 
ing buffer 112 have been processed according to the 
loop of loop step 202 and next step 216, processing 
according to logic flow diagram 200 completes. 
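
The loop of steps 202 through 216 can be sketched end to end. This is a software model under stated assumptions, not the image processor's actual code: the names `pmerge` and `recast` are hypothetical, components are assumed to be big-endian and already scaled to 0..255, and the buffer length is assumed to be a multiple of sixteen bytes.

```python
import struct


def pmerge(upper: bytes, lower: bytes) -> bytes:
    """Interleave respective bytes of two 4-byte words into 8 bytes."""
    return bytes(b for pair in zip(upper, lower) for b in pair)


def recast(processing: bytes) -> bytes:
    """Convert 16-bit components to 8-bit, eight components per iteration."""
    if len(processing) % 16:
        raise ValueError("this sketch needs a multiple of sixteen bytes")
    display = bytearray()
    for offset in range(0, len(processing), 16):
        # Step 204: two eight-byte reads.
        dw302 = processing[offset:offset + 8]
        dw312 = processing[offset + 8:offset + 16]
        # Steps 206 and 208: first four components.
        dw306 = pmerge(dw302[:4], dw302[4:])
        dw310 = pmerge(dw306[:4], dw306[4:])
        # Steps 210 and 212: second four components.
        dw316 = pmerge(dw312[:4], dw312[4:])
        dw320 = pmerge(dw316[:4], dw316[4:])
        # Step 214: one eight-byte write of the combined lower halves.
        display += dw310[4:] + dw320[4:]
    return bytes(display)


values = list(range(0, 256, 16))  # sixteen sample components
converted = recast(struct.pack(">16H", *values))
```

Each iteration in this sketch performs two reads, four merges, and one write for eight components.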

While it is generally described that all pixel compo-
nents stored in processing buffer 112 (Figure 1) are
processed, eight pixel components per iteration of the
loop of loop step 202 (Figure 2) and next step 216, some
buffers do not necessarily store pixels of sequential
scanlines contiguously. Therefore, in a preferred
embodiment, image
processor 110 (Figure 1) processes in each iteration of
the loop of loop step 202 (Figure 2) and next step 216
eight pixel components of a particular scanline stored
within processing buffer 112 (Figure 1). In this preferred
embodiment, image processor 110 processes each
scanline of processing buffer 112 in sequence.

It is appreciated that scanlines of a particular
graphical image represented by processing buffer 112
sometimes have a number of pixel components which is
not evenly divisible by eight. In such circumstances,
image processor 110 processes one, two, three, four,
five, six, or seven pixel components stored within
processing buffer 112 in the manner described above
with respect to steps 204-214 (Figure 2) while ignoring
excess bytes of data double words 302 (Figure 3), 306,
310, 312, 316, 320, and 322. For example, if scanlines
of a graphical image represented within processing
buffer 112 include a number of pixel components which
is one more than an integer multiple of eight, one pixel
component stored within processing buffer 112 is proc-
essed in the following manner.

Image processor 110 reads one pixel component
from processing buffer 112 and stores the read pixel
component as bytes H0 and L0 in data double word 302
(Figure 3). Bytes H1, L1, H2, L2, H3, L3, H4, L4, H5, L5,
H6, L6, H7, and L7 are ignored. PMERGE operations
304 and 308 are executed in the manner described
above. As a result, byte L0 is the most significant byte of
data double word 322 and is stored in display buffer 114
(Figure 1) by image processor 110. Bytes L1-7 (Figure
3) of data double word 322 are ignored.
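
Handling of a partial group at the end of a scanline might be sketched as below. The source says only that excess bytes are ignored; padding the unused positions with zero bytes, and the names `pmerge` and `recast_tail`, are assumptions made for this illustration.

```python
def pmerge(upper: bytes, lower: bytes) -> bytes:
    """Interleave respective bytes of two 4-byte words into 8 bytes."""
    return bytes(b for pair in zip(upper, lower) for b in pair)


def recast_tail(tail: bytes) -> bytes:
    """Recast one to seven trailing 16-bit components to 8-bit."""
    count, odd = divmod(len(tail), 2)
    if odd or not 1 <= count <= 7:
        raise ValueError("expected one to seven two-byte components")
    # Assumption: unused positions are zero-filled; their results are ignored.
    padded = tail + bytes(16 - len(tail))
    dw302, dw312 = padded[:8], padded[8:]
    dw310 = pmerge(*_halves(pmerge(*_halves(dw302))))
    dw320 = pmerge(*_halves(pmerge(*_halves(dw312))))
    dw322 = dw310[4:] + dw320[4:]
    # Keep only the bytes that correspond to real components.
    return dw322[:count]


def _halves(double_word: bytes) -> tuple[bytes, bytes]:
    """Split an 8-byte double word into its upper and lower 4 bytes."""
    return double_word[:4], double_word[4:]


one = recast_tail(bytes([0x00, 0x7F]))
five = recast_tail(bytes([0, 1, 0, 2, 0, 3, 0, 4, 0, 5]))
```

In the one-component case, only the first byte of the combined double word is kept, matching the description of byte L0 above.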

Processor 102 

Processor 102 is shown in greater detail in Figure 5 
and is described briefly herein and more completely in 
the '572 application. Processor 102 includes a prefetch 
and dispatch unit (PDU) 46, an instruction cache 40, an 
integer execution unit (IEU) 30, an integer register file 
36, a floating point unit (FPU) 26, a floating point regis-
ter file 38, and a graphics execution unit (GRU) 28, cou-
pled to each other as shown. Additionally, processor
102 includes two memory management units (IMMU & 
DMMU) 44a-44b, and a load and store unit (LSU) 48, 
which in turn includes data cache 120, coupled to each 
other and the previously described elements as shown. 
Together, the components of processor 102 fetch, dis- 
patch, execute, and save execution results of computer 
instructions, e.g., computer instructions of image proc-
essor 110 (Figure 1), in a pipelined manner.

PDU 46 (Figure 5) fetches instructions from mem-
ory 104 (Figure 1) and dispatches the instructions to
IEU 30 (Figure 5), FPU 26, GRU 28, and LSU 48
accordingly. Prefetched instructions are stored in
instruction cache 40. IEU 30, FPU 26, and GRU 28 per-
form integer, floating point, and graphics operations,
respectively. In general, the integer operands and
results are stored in integer register file 36, whereas the
floating point and graphics operands and results are
stored in floating point register file 38. Additionally, IEU
30 performs a number of graphics operations, and
appends address space identifiers (ASI) to addresses
of load/store instructions for LSU 48, identifying the
address spaces being accessed. LSU 48 generates
addresses for all load and store operations. The LSU 48
also supports a number of load and store operations
specifically designed for graphics data. Memory refer-
ences are made in virtual addresses. MMUs 44a-44b
map virtual addresses to physical addresses.

PDU 46, IEU 30, FPU 26, integer and floating point
register files 36 and 38, MMUs 44a-44b, and LSU 48
can be coupled to one another in any of a number of
configurations as described more completely in the '572
application. As described more completely in the '572
application with respect to Figures 8a-8d thereof, GRU
28 performs a number of distinct partitioned multiplica-
tion operations and partitioned addition operations. Var-
ious partitioned operations used by image processor
110 (Figure 1) are described more completely below.

As described above, processor 102 includes four
(4) separate processing units, i.e., LSU 48, IEU 30, FPU
26, and GRU 28. Each of these processing units is
described more completely in the '572 application.
These processing units operate in parallel and can each
execute a respective computer instruction while others
of the processing units execute a different computer
instruction. GRU 28 executes the PMERGE operations
described above.

In one embodiment, processor 102 is the
UltraSPARC processor and computer system 100 (Fig-
ure 1) is the UltraSPARCstation, both of which are avail-
able from Sun Microsystems, Inc. of Mountain View,
California. Sun, Sun Microsystems, and the Sun Logo
are trademarks or registered trademarks of Sun Micro-
systems, Inc. in the United States and other countries.
All SPARC trademarks are used under license and are
trademarks of SPARC International, Inc. in the United
States and other countries. Products bearing SPARC
trademarks are based upon an architecture developed
by Sun Microsystems, Inc.

Claims

1. A method for converting a first data word which
includes at least two data components in a first data
format to a second data word which includes the at
least two data components in a second data format,
the method comprising:

interleaving (i) a first portion of the first data
word which includes a first one of the at least
two data components and (ii) a second portion
of the first data word which includes a second
one of the at least two data components to form
an interleaved data word which includes a
selected portion of the first data component
which is adjacent to a selected portion of the
second data component, wherein the selected
portions of the first and second data components
represent the first and second data components
in the second data format; and

including the selected portions of the first and
second data components from the interleaved
data word in the second data word.

2. The method of Claim 1 wherein the step of inter- 
leaving is performed by a computer processor in a 
single instruction cycle of the computer processor. 

3. The method of Claim 1 further comprising: 

reading the first data word from a buffer stored 
in a memory of the computer. 

4. The method of Claim 1 wherein the selected portion 
of the first data component is a least significant por- 
tion of the first data word; and 

further wherein the selected portion of the sec- 
ond data component is a least significant por- 
tion of the first second data word. 

5. The method of Claim 1 further comprising: 

storing the second data word in a destination 
buffer in a memory of a computer. 

6. The method of Claim 1 wherein the first portion of 
the first data word further includes a third one of the 
at least two data components; 

(a) further wherein the second portion of the 
first data word further includes a fourth one of 
the at least two data components; 

(b) further wherein the interleaved data word 
further includes a selected portion of the third 
data component and a selected portion of the 
fourth data component, the selected portions of 
the third and fourth data components being 
adjacent to one another within the interleaved 
word; 

(c) further wherein the selected portions of the 
third and fourth data components represent the 
third and fourth data components in the second 
data format; 

(d) further wherein the method further com- 
prises: 

(i) interleaving a first portion of the interleaved
word with a second portion of the
interleaved word to form a second interleaved
word in which the selected portions
of the first, second, third, and fourth data
components are substantially contiguous;
and

(e) further wherein the step of including further
comprises:

(i) including the selected portions of the
third and fourth data components in the
second data word.

7. A computer program product which includes a computer
usable medium having computer readable
code embodied therein for converting a first data
word which includes at least two data components
in a first data format to a second data word which
includes the at least two data components in a second
data format, the computer readable code comprising:

a merge module configured to interleave (i) a
first portion of the first data word which
includes a first one of the at least two data components
and (ii) a second portion of the first
data word which includes a second one of the
at least two data components to form an interleaved
data word which includes a selected
portion of the first data component adjacent to
a selected portion of the second data component,
wherein the selected portions of the first
and second data components represent the
first and second data components in the second
data format; and

a data selection module operatively coupled to
the merge module and configured to include
the selected portions of the first and second
data components from the interleaved data
word in the second data word.

8. The computer program product of Claim 7 wherein
the merge module is further configured to interleave
the first and second portions of the first data
word in a single instruction cycle of a computer
processor.

9. The computer program product of Claim 7 wherein
the computer readable code further comprises:

a data component retrieving module operatively
coupled to the merge module and configured
to read the first data word from a buffer
stored in a memory of a computer.

10. The computer program product of Claim 7 wherein
the selected portion of the first data component is a
least significant portion of the first data word; and

further wherein the selected portion of the second
data component is a least significant portion
of the first second data word.

11. The computer program product of Claim 7 wherein
the computer readable code further comprises:

a data component storage module operatively
coupled to the data selection module and configured
to store the second data word in a destination
buffer in a memory of a computer.

12. The computer program product of Claim 7 wherein
the first portion of the first data word further
includes a third one of the at least two data components;

(a) further wherein the second portion of the
first data word further includes a fourth one of
the at least two data components;

(b) further wherein the interleaved data word
further includes a selected portion of the third
data component and a selected portion of the
fourth data component, the selected portions of
the third and fourth data components being
adjacent to one another within the interleaved
word;

(c) further wherein the selected portions of the
third and fourth data components represent the
third and fourth data components in the second
data format;

(d) further wherein the computer readable code
further comprises:

(i) a second merge module different from
the first-mentioned merge module, operatively
coupled to the first merge module
and the data selection module, and configured
to interleave a first portion of the interleaved
word with a second portion of the
interleaved word to form a second interleaved
word in which the selected portions
of the first, second, third, and fourth data
components are substantially contiguous;
and

(e) further wherein the data selection module is
further configured to include the selected portions
of the third and fourth data components in
the second data word.

13. A data recaster for converting a first data word
which includes at least two data components in a
first data format to a second data word which
includes the at least two data components in a second
data format, the data recaster comprising:

a merge module configured to interleave (i) a
first portion of the first data word which
includes a first one of the at least two data components
and (ii) a second portion of the first
data word which includes a second one of the
at least two data components to form an interleaved
data word which includes a selected
portion of the first data component adjacent to
a selected portion of the second data component,
wherein the selected portions of the first
and second data components represent the
first and second data components in the second
data format; and

a data selection module operatively coupled to
the merge module and configured to include
the selected portions of the first and second
data components from the interleaved data
word in the second data word.

14. The data recaster of Claim 13 wherein the merge
module is further configured to interleave the first
and second portions of the first data word in a single
instruction cycle of a computer processor.

15. The data recaster of Claim 13 further comprising:

a data component retrieving module operatively
coupled to the merge module and configured
to read the first data word from a buffer
stored in a memory of a computer.

16. The data recaster of Claim 13 wherein the selected
portion of the first data component is a least significant
portion of the first data word; and

further wherein the selected portion of the second
data component is a least significant portion
of the first second data word.

17. The data recaster of Claim 13 further comprising:

a data component storage module operatively
coupled to the data selection module and configured
to store the second data word in a destination
buffer in a memory of a computer.

18. The data recaster of Claim 13 wherein the first portion
of the first data word further includes a third
one of the at least two data components;

(a) further wherein the second portion of the
first data word further includes a fourth one of
the at least two data components;

(b) further wherein the interleaved data word
further includes a selected portion of the third
data component and a selected portion of the
fourth data component, the selected portions of
the third and fourth data components being
adjacent to one another within the interleaved
word;

(c) further wherein the selected portions of the
third and fourth data components represent the
third and fourth data components in the second
data format;

(d) further wherein the data recaster further
comprises:

(i) a second merge module different from
the first-mentioned merge module, operatively
coupled to the first merge module
and the data selection module, and configured
to interleave a first portion of the interleaved
word with a second portion of the
interleaved word to form a second interleaved
word in which the selected portions
of the first, second, third, and fourth data
components are substantially contiguous;
and

(e) further wherein the data selection module is
further configured to include the selected portions
of the third and fourth data components in
the second data word.

19. A computer system comprising:

a memory;

a computer processor operatively coupled to
the memory; and

a data recaster stored in the memory and
which includes at least one computer instruction
which is executed within the computer
processor to convert a first data word which
includes at least two data components in a first
data format to a second data word which
includes the at least two data components in a
second data format, the data recaster comprising:

a merge module configured to interleave (i)
a first portion of the first data word which
includes a first one of the at least two data
components and (ii) a second portion of
the first data word which includes a second
one of the at least two data components to
form an interleaved data word which
includes a selected portion of the first data
component adjacent to a selected portion
of the second data component, wherein
the selected portions of the first and second
data components represent the first
and second data components in the second
data format; and

a data selection module operatively coupled
to the merge module and configured
to include the selected portions of the first
and second data components from the
interleaved data word in the second data
word.



20. The computer system of Claim 19 wherein the 
merge module is further configured to interleave 
the first and second portions of the first data word in 
a single instruction cycle of the computer proces- 
sor. 

21. The computer system of Claim 19 wherein the data
recaster further comprises: 

a data component retrieving module opera- 
tively coupled to the merge module and config- 
ured to read the first data word from a buffer 
stored in the memory. 

22. The computer system of Claim 19 wherein the 
selected portion of the first data component is a 
least significant portion of the first data word; and 

further wherein the selected portion of the sec- 
ond data component is a least significant por- 
tion of the first second data word. 

23. The computer system of Claim 19 wherein the data
recaster further comprises: 

a data component storage module operatively 
coupled to the data selection module and con- 
figured to store the second data word in a des- 
tination buffer in the memory. 

24. The computer system of Claim 19 wherein the first
portion of the first data word further includes a third 
one of the at least two data components; 

(a) further wherein the second portion of the 
first data word further includes a fourth one of 
the at least two data components; 

(b) further wherein the interleaved data word 
further includes a selected portion of the third 
data component and a selected portion of the 
fourth data component, the selected portions of
the third and fourth data components being 
adjacent to one another within the interleaved 
word; 

(c) further wherein the selected portions of the 
third and fourth data components represent the 
third and fourth data components in the second 
data format; 

(d) further wherein the data recaster further 
comprises: 

(i) a second merge module different from 
the first-mentioned merge module, opera- 
tively coupled to the first merge module 
and the data selection module, and config- 
ured to interleave a first portion of the inter- 
leaved word with a second portion of the 
interleaved word to form a second interleaved
word in which the selected portions
of the first, second, third, and fourth data
components are substantially contiguous;
and

(e) further wherein the data selection module is
further configured to include the selected portions
of the third and fourth data components in
the second data word.

25. A system for distributing code (i) which is stored on
a computer-readable medium, (ii) which is executable
by a computer, and (iii) which includes at least
one module, each of which in turn is configured to
carry out at least one function to be executed by the
computer, the at least one function including converting
a first data word which includes at least two
data components in a first data format to a second
data word which includes the at least two data components
in a second data format, the system comprising:

a merge module configured to interleave (i) a
first portion of the first data word which
includes a first one of the at least two data components
and (ii) a second portion of the first
data word which includes a second one of the
at least two data components to form an interleaved
data word which includes a selected
portion of the first data component adjacent to
a selected portion of the second data component,
wherein the selected portions of the first
and second data components represent the
first and second data components in the second
data format; and

a data selection module operatively coupled to
the merge module and configured to include
the selected portions of the first and second
data components from the interleaved data
word in the second data word.

26. The system of Claim 25 wherein the merge module
is further configured to interleave the first and second
portions of the first data word in a single
instruction cycle of a computer processor.

27. The system of Claim 25 further comprising: 

a data component retrieving module operatively
coupled to the merge module and configured
to read the first data word from a buffer
stored in a memory of a computer.

28. The system of Claim 25 wherein the selected portion
of the first data component is a least significant
portion of the first data word; and

further wherein the selected portion of the second
data component is a least significant portion
of the first second data word.

29. The system of Claim 25 further comprising: 

a data component storage module operatively 
coupled to the data selection module and con- 
figured to store the second data word in a des- 
tination buffer in a memory of a computer. 

30. The system of Claim 25 wherein the first portion of 
the first data word further includes a third one of the 
at least two data components; 

(a) further wherein the second portion of the 
first data word further includes a fourth one of 
the at least two data components; 

(b) further wherein the interleaved data word 
further includes a selected portion of the third 
data component and a selected portion of the 
fourth data component, the selected portions of 
the third and fourth data components being 
adjacent to one another within the interleaved 
word; 

(c) further wherein the selected portions of the 
third and fourth data components represent the 
third and fourth data components in the second 
data format; 

(d) further wherein the system further
comprises:

(i) a second merge module different from 
the first-mentioned merge module, opera- 
tively coupled to the first merge module 
and the data selection module, and config- 
ured to interleave a first portion of the inter- 
leaved word with a second portion of the 
interleaved word to form a second inter- 
leaved word in which the selected portions 
of the first, second, third, and fourth data 
components are substantially contiguous; 
and 

(e) further wherein the data selection module is 
further configured to include the selected por- 
tions of the third and fourth data components in 
the second data word. 